1
National Research University Higher School of Economics, 20 Myasnitskaya Str., Moscow, 101000, Russian Federation
2
Institute of Communications and Media Business, Moscow State University of Printing Arts. , 2a, Pryanishnikova street, Moscow, 127550, Russian Federation.
Maxim I. Kolomeychenko - Graduate of MSc Program, National Research University Higher School of Economics; Address: 20, Myasnitskaya street, Moscow, 101000, Russian Federation. E-mail: maxim.kolomeychenko@mail.ru
Andrey M. Chepovskiy - Associate Professor, Department of Information Security Management, Faculty of Business Informatics, National Research University Higher School of Economics; Professor, Department of Applied Mathematics and Systems Modeling, Institute of Communications and Media Business, Moscow State University of Printing Arts. Address: 2a, Pryanishnikova street, Moscow, 127550, Russian Federation. E-mail: achepovskiy@hse.ru
The problem of huge graph visualization arises in various fields of sociology and marketing. The relevance of this work is determined by the need for the software package for analysis and visualization of such graphs. This paper introduces summary analysis of several software products and highlights their weaknesses: lack of cross-platform and specialized graph warehouses, inability to deal with huge graphs. This paper presents a detailed description of the overall architecture of the software developed and each of its modules separately, as well as a procedure for communication between core modules. A special graph warehouse capable to process graphs having up to 100 million vertices and billions of links is used to store graphs. A description of main data warehousing principles is also introduced. The use of a proprietary file system ensures the absence of additional system calls when working with the warehouse and the lack of a complex addressing system and any excessive mechanisms, that enables to avoid any extra overhead costs associated with data warehousing. Furthermore, there is a description of a methodology for data visualization module, used data structures and computer graphics algorithms, that enable to handle graphs comprising up to millions of vertices on real time basis. It is worth noting a wide range of algorithms for auto layout of graphs such as random layout, circular layout, circular component layout, peacock tail layout, one or two themes layout, layout based on community detection and estimation of cohesion. This paper presents a detailed description of each of the layout mentioned above. Special attention should be paid to the developed methods of graph analysis. Algorithms have been designed to detect communities in social networks, to evaluate graph cohesion, to find the shortest paths between any pair of vertices, graphs union and intersection, etc. A key feature of all algorithms presented in this paper is their ability to handle huge graphs.
Citation:
Kolomeychenko M., Chepovskiy A. (2014) Vizualizatsiya i analiz grafov bol'shikh razmerov [Huge graph visualization and analysis]. Biznes-informatika, no 4 (30), pp. 7-16 (in Russian)